Goto

Collaborating Authors

 oracle-efficient pac rl


On Oracle-Efficient PAC RL with Rich Observations

Neural Information Processing Systems

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.


Reviews: On Oracle-Efficient PAC RL with Rich Observations

Neural Information Processing Systems

Moreover, the reward depends only on x_t and the action, not the state S_t. They then correctly state (again, lines 99-100) that this makes the problem an MDP over X. It argues "The hidden states serve to introduce structure to the MDP and enable tractable learning." I don't understand why this is the case.


On Oracle-Efficient PAC RL with Rich Observations

Neural Information Processing Systems

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.